Ethan Fosse
September 21, 2017
Research Associate, Department of Sociology
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Teaching Staff
Faculty Sponsors
MyFirstScript.R
States.RData
StatesHealth.dta
MyFirstMarkdown.Rmd
Did Obama or McCain win Ohio in 2008?
We'll use data to answer this question!
Load States.RData into our workspace
Using RStudio's user-friendly interface: browse to the folder that holds the file (for example, C:/Folder/)
You can also try this R Code:
load("C:/Folder/States.RData")
load() is a function
Look at the Environment tab
Use class() to see what we have!
class(States)
Use View() to open the data set in a spreadsheet-style viewer:
View(States)
R functions: print(), head(), tail()
print(States)
head(States)
tail(States)
... print() function?
... head() and tail()?
str()
str(States)
dim(), nrow(), ncol()
dim(States)
nrow(States)
ncol(States)
rownames() and colnames()
rownames(States)
colnames(States)
View(States)
The 1936 Election was a landslide (46 versus 2 states won):
Was 2008 a landslide? How many states did Obama win versus McCain?
Summarizing a data set using summary()
Try this R Code:
summary(States)
... (NE)?
Use dollar sign ($) notation: Dataset$Variable
States$HouseholdIncome
States$Region
Use the summary() function on single variables!
summary(States$HouseholdIncome)
summary(States$Region)
class() and str()
class(States$HouseholdIncome)
class(States$Region)
str(States$HouseholdIncome)
str(States$Region)
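As a small sketch of why class() matters, the example below builds a made-up data frame (Toy is a hypothetical stand-in for States, and its values are invented for illustration) and shows that summary() reports different things for a numeric variable and for a factor.

```r
# Toy is a hypothetical stand-in for States; the values are invented
Toy <- data.frame(Income = c(40, 55, 62),
                  Region = factor(c("W", "S", "W")))

# the class of a variable determines how R summarizes it
income_class <- class(Toy$Income)   # a numeric variable
region_class <- class(Toy$Region)   # a categorical variable (factor)

# numeric variable: minimum, quartiles, median, mean, maximum
income_summary <- summary(Toy$Income)

# factor: a count of observations in each level
region_summary <- summary(Toy$Region)

print(income_summary)
print(region_summary)
```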
Use the assignment operator (<-) to copy a data set
NewDataset <- Dataset
StatesCopy <- States
str(StatesCopy)
class(StatesCopy)
Is StatesCopy the same as States?
Use "<-" to create a standalone variable
NewVariable <- Dataset$Variable
HouseholdIncome <- States$HouseholdIncome
str(HouseholdIncome)
class(HouseholdIncome)
Is HouseholdIncome the same as States$HouseholdIncome?
Use the Environment tab to see all the variables and data sets in the R workspace
ls() lists the objects in the workspace; rm() removes them
ls()
Population <- States$Population
summary(Population)
rm(Population)
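To make the copying behavior concrete, here is a minimal sketch on a made-up data frame (Toy, ToyCopy and the values are hypothetical, not part of the workshop files). It suggests that a copy made with <- starts out identical to the original, that changing the copy leaves the original alone, and that rm() removes an object from the workspace.

```r
# Toy is a hypothetical stand-in for States; the values are invented
Toy <- data.frame(Income = c(40, 55, 62),
                  row.names = c("A", "B", "C"))

# <- makes a copy of the data set
ToyCopy <- Toy
same_before <- identical(ToyCopy, Toy)   # the copy starts out identical

# changing the copy does not change the original
ToyCopy$Income[1] <- 99
same_after <- identical(ToyCopy, Toy)

# <- also creates a standalone variable from one column
Income <- Toy$Income

# rm() removes an object; exists() checks whether it is still there
rm(Income)
income_still_there <- exists("Income", inherits = FALSE)
```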
Was 2008 a landslide? How many states did Obama win versus McCain?
R Code Hint:
Winner <- States$ObamaMcCain
summary(Winner)
Or you can just do:
summary(States$ObamaMcCain)
... Winner?
MyFirstScript.R
nrow(States)
ncol(States)
Add comments with the # symbol
nrow(States) # number of rows
ncol(States) # number of columns
mean(), median()
mean(States$HouseholdIncome)
median(States$HouseholdIncome)
range(), sd(), IQR()
range(States$HouseholdIncome)
sd(States$HouseholdIncome)
IQR(States$HouseholdIncome)
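The functions above can be tried on a short vector where the answers are easy to check by hand. This is only a sketch: income is a hypothetical vector with invented values, not a column from States.

```r
# hypothetical incomes (in thousands); the values are invented
income <- c(30, 40, 50, 60, 70)

# measures of center
center_mean   <- mean(income)     # arithmetic average
center_median <- median(income)   # middle value

# measures of spread
spread_range <- range(income)     # smallest and largest value
spread_sd    <- sd(income)        # sample standard deviation
spread_iqr   <- IQR(income)       # 75th percentile minus 25th percentile
```

Note that range() returns two numbers (the minimum and the maximum), not their difference.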
hist()
The breaks argument suggests the number of bins
hist(States$HouseholdIncome)
hist(States$HouseholdIncome, breaks=3)
hist(States$HouseholdIncome, breaks=15)
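To see what breaks does without drawing anything, the sketch below calls hist() with plot = FALSE on a hypothetical vector (x and its values are invented) and counts the bins R actually used. A single number passed to breaks is treated as a suggestion, so the bin count may not match it exactly.

```r
# hypothetical values; invented for illustration
x <- c(31, 35, 42, 47, 48, 51, 53, 58, 64, 72)

# plot = FALSE returns the histogram's numbers instead of drawing it
h_few  <- hist(x, breaks = 3,  plot = FALSE)
h_many <- hist(x, breaks = 15, plot = FALSE)

# number of bins R actually chose in each case
n_bins_few  <- length(h_few$counts)
n_bins_many <- length(h_many$counts)

# every observation lands in exactly one bin
total_few  <- sum(h_few$counts)
total_many <- sum(h_many$counts)
```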
plot()
plot(States$College, States$ObamaVote)
By using text() immediately after we use plot() we can add labels to the scatter plot
labels is used to specify the variable with the labels
plot(States$College, States$ObamaVote)
text(States$College, States$ObamaVote, labels=States$State)
To list the categories (levels) of a variable: levels()
To create a table of counts: table()
Try this R Code:
levels(States$Region)
table(States$Region)
... (W)?
To cross-tabulate two variables: table()
table(States$Region, States$ObamaMcCain)
... (W) and voted for McCain?
To draw a bar plot of a categorical variable: plot()
The height of the bars equals the number of observational units in each category (or level)
Try this R Code:
plot(States$ObamaMcCain)
mean(States$ObamaVote)
table(States$Region, States$ObamaMcCain)
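As a rough illustration of levels() and table(), the sketch below uses two short made-up factors (region and winner are hypothetical and do not describe real states). A two-way table can be indexed by level name to pull out a single count.

```r
# hypothetical categorical variables; the values are invented
region <- factor(c("NE", "S", "W", "W", "S", "W"))
winner <- factor(c("Obama", "McCain", "Obama", "McCain", "McCain", "Obama"))

# the categories of a factor, in alphabetical order by default
region_levels <- levels(region)

# one-way table: number of observations in each region
region_counts <- table(region)

# two-way table: regions in rows, winners in columns
two_way <- table(region, winner)

# pick out one cell by row name and column name
west_mccain <- two_way["W", "McCain"]

print(two_way)
```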
[ , ]
General format:
data[row, ]
data[ , column]
data[row, column]
Ask yourself:
... States.RData?
data[row, ]
View(States) # look for the 1st row
States[1, ]
States["Alabama", ]
data[ , column]
View(States) # look for the 2nd column
States[ , 2]
States[ , "HouseholdIncome"]
data[row, column]
View(States) # look for the 1st row and 2nd column
States[1, 2]
States["Alabama", "HouseholdIncome"]
Use the c() function, which combines (or concatenates) a set of elements
States[c(1, 5), ]
States[c("Alabama", "California"), ]
States[c(1, 5), c(3, 9)]
States[c("Alabama", "California"), c("McCainVote","College")]
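The indexing rules above can be checked on a tiny data frame where every value is visible at once. This is a sketch only: Toy, its row names and its values are hypothetical and invented for illustration.

```r
# Toy is a hypothetical stand-in for States; the values are invented
Toy <- data.frame(Income  = c(40, 55, 62),
                  College = c(20, 30, 25),
                  Vote    = c(45, 60, 52),
                  row.names = c("A", "B", "C"))

# data[row, ]: one row, all columns (still a data frame)
first_row <- Toy[1, ]

# data[ , column]: all rows, one column (a plain vector)
second_col <- Toy[ , 2]

# data[row, column]: a single cell, by position or by name
cell_by_position <- Toy[1, 2]
cell_by_name     <- Toy["A", "College"]

# c() selects several rows and columns at once
some_rows_cols <- Toy[c("A", "C"), c("Income", "Vote")]
```

Selecting by name is usually easier to read and keeps working if the rows or columns are reordered.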
What percentage voted for Obama in Mississippi compared to Massachusetts?
R Code Hint:
States[c("Mississippi", "Massachusetts"), c("ObamaVote")]
StatesHealth.dta
... (.dta extension)
The foreign R package
R packages cover everything from text mining (tm package) to data visualization (ggplot2 package) to data wrangling (dplyr package)
Go to the Packages tab and click Install
install.packages("package_name")
package_name is just the name of the R package in quotes
install.packages("foreign")
library(foreign)
getwd(), setwd(), dir(), read.dta()
getwd()
setwd("C:/Folder/")
dir()
StatesHealth <- read.dta("StatesHealth.dta")
"C:/Folder/" should be changed to the location of the Stata data set on your computer
View(StatesHealth)
head(StatesHealth)
tail(StatesHealth)
... States.RData?
To save a data set as an .RData file, we can use save()
dir()
save(StatesHealth, file="StatesHealth.RData")
dir()
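To check that save() and load() really undo each other, the sketch below writes a made-up data frame to a temporary folder and reads it back. Toy, Original and the file name Toy.RData are hypothetical, and tempdir() is used here only so the example does not touch your working directory.

```r
# Toy is a hypothetical stand-in for StatesHealth; the values are invented
Toy <- data.frame(Income = c(40, 55, 62),
                  row.names = c("A", "B", "C"))

# build a file path inside R's temporary folder for this session
path <- file.path(tempdir(), "Toy.RData")

# save() writes the named object to the .RData file
save(Toy, file = path)

# keep a reference copy, then remove Toy from the workspace
Original <- Toy
rm(Toy)

# load() restores the object under its original name
# and returns the names of everything it restored
loaded_names <- load(path)

# the restored data set matches the one we saved
round_trip_ok <- identical(Toy, Original)
```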
To save the entire workspace, use save.image()
save.image("Everything.RData")
... .RData extension
But we can save the States data set as a Stata data set
Try this R Code:
write.dta(States, file="States.dta")
dir()
... States.dta with Stata!
Excel (.xlsx): xlsx package, read.xlsx()
SPSS (.sav): foreign package, read.spss()
SAS transport (.xpt): foreign package, read.xport()
Did “healthier” states vote for McCain or Obama in 2008?
R Code Hint:
plot(StatesHealth$Obese, StatesHealth$ObamaVote)
text(StatesHealth$Obese, StatesHealth$ObamaVote,
labels=StatesHealth$State)
URL: https://compass-workshops.github.io/info/
Email List: Send an email to listserv@lists.princeton.edu with “Subscribe COMPASSWORKSHOPS” in the body and all other lines blank, including the subject
MyFirstMarkdown.Rmd
The initial chunk of text contains instructions for R
---
title: "My First Markdown"
author: "Ethan Fosse (COMPASS Workshops)"
date: "February 13, 2018"
output: html_document
---
... MyFirstMarkdown.Rmd:
# loading the R data set
# load("C:/Folder/States.RData")
# examining the data set
head(States)
nrow(States)
ncol(States)
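An R code chunk begins with 3 single back quotes followed by {r} and ends with 3 single back quotes
You make things bold by using double asterisks, like this: **bold**, and you make things italics by using single asterisks, like this: *italics*.
mean(States$ObamaVote)
{r}
ObamaMcCain <- States$ObamaMcCain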
plot(ObamaMcCain)
# load("C:/Folder/States.RData")
Delete the # so that R will run this line of code
Change load("C:/Folder/States.RData") to reflect the correct location of States.RData on your computer
Make sure load() specifies the appropriate location